Adapting Morphology for Arabic Information Retrieval

Authors

  • Kareem Darwish
  • Douglas W. Oard
Abstract

This chapter presents an adaptation of existing techniques in Arabic morphology that leverages corpus statistics to make them suitable for Information Retrieval (IR). The adaptation resulted in the development of Sebawai, a shallow Arabic morphological analyzer, and Al-Stem, an Arabic light stemmer. Both were used to produce index terms for Arabic IR experiments. Sebawai generates the possible roots and stems of a given Arabic word, along with probability estimates of deriving the word from each possible root. These probability estimates were used as a guide to determine which prefixes and suffixes to use in building the light stemmer Al-Stem. The use of Sebawai-generated roots and stems as index terms, along with the stems from Al-Stem, is evaluated in an information retrieval application, and the results are compared...
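The abstract describes using corpus-derived probability estimates to decide which affixes a light stemmer should strip. The Python sketch below illustrates that idea under stated assumptions; it is not Sebawai's or Al-Stem's actual implementation. The segmentations are hypothetical toy data in Buckwalter transliteration, affix probabilities are plain relative frequencies, the selection threshold is arbitrary, and the longest-match stripping rule with a minimum stem length is a common light-stemming heuristic rather than a confirmed detail of Al-Stem. The names `affix_probabilities`, `select_affixes`, and `light_stem` are hypothetical.

```python
from collections import Counter

# Hypothetical (prefix, stem, suffix) segmentations in Buckwalter
# transliteration, standing in for the analyses a shallow morphological
# analyzer might produce from a corpus. "" means "no affix".
SEGMENTATIONS = [
    ("Al", "ktAb", ""),
    ("wAl", "ktAb", ""),
    ("", "ktAb", "hm"),
    ("Al", "mdrs", "p"),
    ("w", "qAl", ""),
    ("Al", "qlm", ""),
    ("", "mdrs", "At"),
    ("bAl", "byt", ""),
]


def affix_probabilities(segs):
    """Estimate P(prefix) and P(suffix) as relative frequencies over all analyses."""
    n = len(segs)
    prefix_counts = Counter(p for p, _, _ in segs)
    suffix_counts = Counter(s for _, _, s in segs)
    return (
        {a: c / n for a, c in prefix_counts.items()},
        {a: c / n for a, c in suffix_counts.items()},
    )


def select_affixes(probs, threshold):
    """Keep non-empty affixes whose estimated probability meets the threshold,
    ordered longest first (ties broken alphabetically) for longest-match stripping."""
    kept = [a for a, p in probs.items() if a and p >= threshold]
    return sorted(kept, key=lambda a: (-len(a), a))


def light_stem(word, prefixes, suffixes, min_stem=2):
    """Strip at most one prefix and then at most one suffix, trying longer
    affixes first and never leaving fewer than min_stem characters."""
    for p in prefixes:
        if word.startswith(p) and len(word) - len(p) >= min_stem:
            word = word[len(p):]
            break
    for s in suffixes:
        if word.endswith(s) and len(word) - len(s) >= min_stem:
            word = word[: -len(s)]
            break
    return word
```

With a threshold of 0.1 on this toy data, the selected prefixes are `bAl`, `wAl`, `Al`, and `w`, and `light_stem("wAlktAb", prefixes, suffixes)` returns `ktAb`. Raising the threshold to 0.2 keeps only the frequent prefix `Al` and no suffixes, which shows how the estimates act as a filter on the affix lists.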


Similar Resources

Light Stemming for Arabic Information Retrieval

Computational Morphology is an urgent problem for Arabic Natural Language Processing, because Arabic is a highly inflected language. We have found, however, that a full solution to this problem is not required for effective information retrieval. Light stemming allows remarkably good information retrieval without providing correct morphological analyses. We developed several light stemmers for ...

Retrieving Arabic Printed Document: a Survey

This paper surveys some of the literature pertaining to searching and retrieving OCR’ed printed documents with emphasis on Arabic documents. It examines peculiarities of Arabic morphology, orthography, retrieval, word clustering, display, OCR, and error correction. The paper surveys existing evaluation test-beds for retrieval of Arabic OCR texts. Lastly, it concludes with possible directions fo...

Effective Stemming for Arabic Information Retrieval

Arabic has a very rich and complex morphology. Its appropriate morphological processing is very important for Information Retrieval (IR). In this paper, we propose a new stemming technique that tries to determine the stem of a word representing the semantic core of this word according to Arabic morphology. This method is compared to a commonly used light stemming technique which truncates a wor...

Developing a New System for Arabic Morphological Analysis and Generation

Arabic morphology poses special challenges to computational natural language processing systems. Its rich morphology and the highly complex word formation process of roots and patterns make computational approaches to Arabic very challenging. In this paper we present an approach for morphological analysis and generation of Modern Standard Arabic (MSA). Our approach is based on Arabic morphologi...

Design, Construction and Validation of an Arabic-English Conceptual Interlingua for Cross-lingual Information Retrieval

This paper describes the issues involved in extending a trans-lingual lexicon, the TextWise Conceptual Interlingua (CI), with Arabic terms. The Conceptual Interlingua is based on the Princeton English WordNet (Fellbaum, 1998). It is a central component in the cross-lingual information retrieval (CLIR) system CINDOR (Conceptual INterlingua for DOcument Retrieval). Arabic has a rich morphological...

Publication date: 2007